Initialization of Iterative Refinement Clustering Algorithms
Authors
Abstract
Iterative refinement clustering algorithms (e.g., K-Means, EM) converge to one of numerous local minima, and they are known to be especially sensitive to initial conditions. We present a procedure for computing a refined starting condition from a given initial one, based on an efficient technique for estimating the modes of a distribution. The refined starting condition leads to convergence to “better” local minima. The procedure is applicable to a wide class of clustering algorithms for both discrete and continuous data. We demonstrate the application of this method to the Expectation Maximization (EM) clustering algorithm and show that refined initial points indeed lead to improved solutions. Refinement run time is considerably lower than the time required to cluster the full database. The method is scalable and can be coupled with a scalable clustering algorithm to address large-scale clustering problems in data mining.
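The abstract does not spell out the refinement steps. As a hedged illustration only, here is a sketch of one subsample-and-pool scheme in that spirit, using K-Means rather than EM for brevity. It clusters several small random subsamples from the given starting point and pools the resulting centroids, which tend to concentrate near the modes of the distribution. It then re-clusters that pool from each subsample solution and keeps the lowest-distortion result as the refined start. The function names (`refine_initial_points`, `kmeans`, `distortion`), the subsample count, the sample fraction, and the choice to leave empty clusters in place are illustrative assumptions, not the authors' published settings.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distortion(data: Sequence[Point], centers: Sequence[Point]) -> float:
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in data)


def kmeans(data: Sequence[Point], centers: Sequence[Point], iters: int = 100) -> List[Point]:
    """Plain Lloyd iterations starting from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in centers]
        for p in data:
            # Assign each point to its nearest current center.
            j = min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            clusters[j].append(p)
        new_centers = []
        for c, members in zip(centers, clusters):
            if members:
                n = len(members)
                new_centers.append(tuple(sum(col) / n for col in zip(*members)))
            else:
                # Simplifying assumption: an empty cluster keeps its old center.
                new_centers.append(c)
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def refine_initial_points(
    data: Sequence[Point],
    init: Sequence[Point],
    num_subsamples: int = 10,  # hypothetical default
    fraction: float = 0.1,     # hypothetical default
    seed: int = 0,
) -> List[Point]:
    """Hypothetical subsample-and-pool refinement of the starting point `init`."""
    rng = random.Random(seed)
    k = len(init)
    m = min(len(data), max(k, int(len(data) * fraction)))
    # 1. Cluster several small random subsamples, each from the given start.
    solutions = [kmeans(rng.sample(list(data), m), init) for _ in range(num_subsamples)]
    # 2. Pool every subsample centroid; these tend to gather near the modes.
    pooled = [c for sol in solutions for c in sol]
    # 3. Re-cluster the pool from each subsample solution; keep the lowest-distortion one.
    return min((kmeans(pooled, sol) for sol in solutions),
               key=lambda cs: distortion(pooled, cs))


if __name__ == "__main__":
    gen = random.Random(42)
    # Three synthetic 2-D blobs (illustrative data only).
    data = [(gen.gauss(cx, 0.5), gen.gauss(cy, 0.5))
            for cx, cy in [(0, 0), (6, 0), (3, 5)] for _ in range(200)]
    start = [(0.0, 0.0), (0.2, 0.1), (4.5, 2.5)]  # a deliberately poor start
    refined = refine_initial_points(data, start)
    print("from start:  ", distortion(data, kmeans(data, start)))
    print("from refined:", distortion(data, kmeans(data, refined)))
```

Each clustering run inside the refinement works on a small subsample or on the pool of `num_subsamples * k` centroids rather than on the full data set. The printed distortions depend on the synthetic data and are not results from the paper; refinement is not guaranteed to improve every start.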
Similar sources
A robust iterative refinement clustering algorithm with smoothing search space
Iterative refinement clustering algorithms are widely used in the data mining area, but they are sensitive to initialization. In the past decades, many modified initialization methods have been proposed to reduce the influence of the initialization sensitivity problem. The essence of iterative refinement clustering algorithms is the local search method. The large number of local minimum points w...
Maximin Initialization for Cluster Analysis
Most iterative clustering algorithms require a good initialization to achieve accurate results. A new initialization procedure for all such algorithms is given that is exact when the data contain compact, separated clusters. Our examples use c-means clustering.
TR-2011002: Symbolic Lifting for Structured Linear Systems of Equations: Numerical Initialization, Nearly Optimal Boolean Cost, Variations, and Extensions
Hensel’s symbolic lifting for a linear system of equations and numerical iterative refinement of its solution are strikingly similar. Combining the power of lifting and refinement seems to be a natural resource for further advances, but turns out to be hard to exploit. In this paper, however, we employ iterative refinement to initialize lifting. In the case of Toeplitz, Hankel, and other popu...
An improved opposition-based Crow Search Algorithm for Data Clustering
Data clustering is an ideal way of working with a huge amount of data and looking for structure in a dataset. In other words, clustering is the grouping of similar data: the similarity among data in the same cluster is maximal, and the similarity among data in different clusters is minimal. The innovation of this paper is a clustering method based on the Crow Search Algorithm (CS...